Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis
Authors
Abstract
Subjectivity analysis and authorship attribution are both very popular research areas, yet work in them has so far been done separately. Our conjecture is that combining information about subjectivity and authorship in texts can improve performance on both tasks. In this paper, a personalized approach to opinion mining is presented that introduces the notions of personal sense and idiolect; the approach is applied to the polarity classification task. We assume that different authors express their private states in text in individual ways, so opinion mining results could be improved by analyzing texts by different authors separately. The hypothesis is tested on a corpus of movie reviews written by ten authors. The results of applying the personalized approach confirm that it improves opinion mining performance. Automatic authorship attribution is then applied to model the personalized approach, classifying documents by their assumed authorship. Although automatic authorship classification imposes a number of limitations on the dataset used in further experiments, once these issues are overcome, the authorship attribution technique modeling the personalized approach confirms the improvement over a baseline that uses no authorship information.
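The abstract does not state which classifier or features were used, so the following is only a minimal sketch of the general idea under stated assumptions: a multinomial Naive Bayes classifier over lowercase bag-of-words tokens (our choice, not necessarily the authors'). One polarity classifier is trained per author; when a document's author is unknown, an authorship-attribution classifier first predicts the author and the document is routed to that author's polarity model. The class names `NaiveBayes` and `PersonalizedPolarity` and the toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical feature extraction: lowercase whitespace tokens (bag of words).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # ignore unseen words
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class PersonalizedPolarity:
    """Per-author polarity models, routed by a known or predicted author."""

    def fit(self, docs, authors, polarities):
        # Authorship attribution model: predicts who wrote a document.
        self.author_model = NaiveBayes().fit(docs, authors)
        # One polarity classifier per author, trained only on that author's texts.
        by_author = defaultdict(lambda: ([], []))
        for doc, author, pol in zip(docs, authors, polarities):
            by_author[author][0].append(doc)
            by_author[author][1].append(pol)
        self.polarity_models = {
            a: NaiveBayes().fit(d, p) for a, (d, p) in by_author.items()
        }
        return self

    def predict(self, doc, author=None):
        # Fall back to automatic authorship attribution when the author is unknown.
        if author is None:
            author = self.author_model.predict(doc)
        return self.polarity_models[author].predict(doc)


# Hypothetical toy corpus: two authors, each with one positive and one negative review.
docs = [
    "brilliant acting superb plot",
    "dull boring plot",
    "great fun loved it",
    "awful waste hated it",
]
authors = ["A", "A", "B", "B"]
polarity = ["pos", "neg", "pos", "neg"]

model = PersonalizedPolarity().fit(docs, authors, polarity)
baseline = NaiveBayes().fit(docs, polarity)  # single pooled model, no authorship info

print(model.predict("superb acting"))          # author predicted automatically -> pos
print(model.predict("loved it", author="B"))   # author given -> pos
```

The pooled `baseline` corresponds to the "no authorship information" setting; comparing both on held-out reviews would mirror the comparison the abstract describes, though a toy corpus this small says nothing about real performance.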
Similar resources
Idiolect-based Identity Disclosure and Authorship Attribution in Web-based Social Spaces
In this paper, we inspect new possible methods of Web surveillance combining web mining with sociolinguistic and semiotic related knowledge of human discourse. We first give an overview of telecommunication surveillance methods and systems, with focus on the Internet, and we describe the legal issues involved in Web or Internet communications investigations. We put the emphasis on identity disc...
Identifying subjective statements in news titles using a personal sense annotation framework
Subjective language contains information about private states. The goal of subjective language identification is to identify that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense", has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal ex...
Can Anonymous Posters on Medical Forums be Reidentified?
BACKGROUND Participants in medical forums often reveal personal health information about themselves in their online postings. To feel comfortable revealing sensitive personal health information, some participants may hide their identity by posting anonymously. They can do this by using fake identities, nicknames, or pseudonyms that cannot readily be traced back to them. However, individual writ...
Authorship Attribution Using Text Distortion
Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...
Clustering by Authorship Within and Across Documents
The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estim...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2010